HEAD
An overall question we would like to investigate in next weeks lab is: what trends are there in animals leaving and entering a humane society and what factors lead to more animals getting adopted?
For our domain expert, we are going to be doing this research for the Austin Animal Shelter in Texas. By analyzing their data, we can help them to determine what factors lead to higher rates of adoption of their animals. We can also see which factors may be applicable to other animal shelters across the country. By discovering the trends present in this data set they will be able to change policy and practices to hopefully increase adoption rates and possibly decrease instances of euthanasia.
The data we will use to answer our questions is a dataset from Kaggle that gives information about thousands of animals admitted to the Austin Animal Center Shelter. The data set describes each animal with details including its age, gender, and condition, and outcome of the animal. This dataset can be found at the following URL: https://www.kaggle.com/aaronschlegel/austin-animal-center-shelter-outcomes-and
shelterdata3<-select(shelter_data33, -animal_id, -color, -monthyear, -name, -X13, -X14, -X15, -X16, -X17, -X18)
shelterdata3<- shelterdata3 %>% separate(sex_upon_outcome, into=c("Status","Sex"),sep=" ", fill="left")
shelterdata3<- shelterdata3 %>% separate(datetime, into=c("Year","Month"),sep="-",extra="drop")
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [31].
shelterdata3<-shelterdata3[-31,]
shelterdata3$Year<-parse_integer(shelterdata3$Year)
shelterdata3$Month<-parse_integer(shelterdata3$Month)
shelterdata4<-as.tibble(select(shelterdata3, age_upon_outcome,animal_type,outcome_type,Sex))
testing6<-as.tibble(filter(shelterdata4,age_upon_outcome!="NULL",outcome_type!="Died",outcome_type!="Missing",outcome_type!="Rto-Adopt",outcome_type!="NA"))
testing7<-testing6%>%
separate(age_upon_outcome,c("age_num","age_unit")," ")
testing8<-as.tibble(filter(testing7,age_unit!="day",age_unit!="days",age_unit!="month",age_unit!="week",age_unit!="year",age_unit!="weeks"))
testing9<-as.tibble(filter(testing8,outcome_type=="Adoption"))
ggplot(data = testing9,mapping=aes(x=animal_type),na.rm=TRUE)+geom_bar()+facet_wrap(~age_unit)
As we can see from the plot above, one of the major factors influencing adoption rates is age, dependent on species. For cats, there are more adoptions when they are younger, while there are conversely more adoptions for older dogs. This could be due to the fact that older dogs are more likely to be perceived as mild, and could have been trained by previous owners to be well-behaved. Since cats tend to wander outside and mate, there could be a correlation between young cats and adoption simply because there may be more young cats born on the street that become strays. Either way, we see a clear trend to adopt more younger cats and older dogs.
animals_train = read_csv("~/Downloads/train.csv")
animals_test = read_csv("~/Downloads/test.csv")
animals_train = animals_train[,-1]
animals_test = animals_test[,-1]
animals <- bind_rows(animals_train, animals_test)
animals$Name[is.na(animals$Name)] <- "Unknown"
animals$HasName <- ifelse(animals$Name == 'Unknown', 0,
ifelse(animals$Name != 'Unknown', 1, NA))
animals <- animals %>% mutate(HasName = as.factor(HasName))
animals %>% ggplot(aes(DateTime, fill=HasName)) +
geom_density(alpha=0.5) +
facet_wrap(~AnimalType)
In this plot, we found that cats go missing from there home significantly more often in the summer than any other time of year, while the rate of dogs coming to the shelter are relatively constant throughout the year. This can probably be attributed to the fact that people may have windows and doors open more often in their homes during the summer than any other time of year, making it easier for cats to escape and explore beyond their boundaries. This means that every summer, there is generally a larger influx of cats admitted to the animal shelter. The shelter could use this information to make sure they are equipped to handle more cats in the summer than the rest of the year; on the other hand, they can expect to have a relatively consistent number of dogs in the shelter.
library(scales)
ggplot(animals %>% subset(OutcomeType != "Unknown") %>%
count(HasName, OutcomeType) %>%
mutate(pct=n/sum(n)),
aes(HasName, n, fill=OutcomeType)) +
geom_bar(position = "fill", stat="identity") +
scale_y_continuous(labels = percent_format())
Here we can see that animals that have a known name fare far better than ones with an unknown name. Animals with an unknown name are far less likely to be adopted or returned to their owner and far more likely to die or be euthanised. This is important for animal owners to note, making sure your pet is identifiebly taged will greatly increase your chances of you pet being returned to you should you lose it.
Do different types of animals have different outcomes of entering the shelter, and how do different kinds of animals have different abilities to adapt? Since we can investigate this problem from age and sex and different outcomes and give consumers information for what they want to breed.
Does classifcation of cats help them get better service? For example, we could classify different kinds of cat by ages. The cats with younger age could adopt much more quickly than adult cats, and younger cats could have more convenience service than others. For example like therapy
library(tidyverse)
mydata<-read.csv('aac_shelter_outcomes.csv')
mydata<-mydata[,1:12]
mydata %>%group_by(animal_type,outcome_type)%>%count()
## # A tibble: 37 x 3
## # Groups: animal_type, outcome_type [37]
## animal_type outcome_type n
## <fct> <fct> <int>
## 1 Bird Adoption 111
## 2 Bird Died 4
## 3 Bird Disposal 22
## 4 Bird Euthanasia 82
## 5 Bird Missing 1
## 6 Bird Relocate 7
## 7 Bird Return to Owner 9
## 8 Bird Transfer 76
## 9 Cat "" 2
## 10 Cat Adoption 11621
## # ... with 27 more rows
library(ggplot2)
thedata<-subset(mydata,select =c(age_upon_outcome,animal_type,outcome_type))
thedata<-subset(thedata,thedata$outcome_type!="")
thedata$age_upon_outcome<-as.character(thedata$age_upon_outcome)
thedata$age_type<-(strsplit(thedata$age_upon_outcome," "))
thedata$age_type<-as.character(lapply(thedata$age_type,function(x){unlist(x)[2]}))
cat_data<-subset(thedata,thedata$animal_type=="Cat")
cat_data<-na.omit(cat_data)
ggplot(data=thedata,aes(x=animal_type,fill=factor(outcome_type)))+geom_bar(position="fill")+labs(title = "outcomes of entering the shelter for different types of animals", x = "types of animals", y = "percentage")
ggplot(data=cat_data,aes(x=age_type,fill=factor(outcome_type)))+geom_bar(position="fill")+labs(title = "outcomes of entering the shelter for different age of cats", x = "types of age", y = "percentage")
For the first question, we can see that the cat and dog have bigger percentage to be adopt, and the cat has bigger percentage to be adopt in the months of age, also we could find that cat has the smallest percentage to be adopt in the days of the month. Therefore, we need to care more about younger cats.
My primary concern is how breed of dog/cats will affect the sex_upon_outcome of animals that enter into the Austin Animal center. We are curious about whether the sex_upon_outcome has a significant different when breed varies.
Then, we will explore which breeds of animals are more likely to enter into the Austin Animal center.
We divide our result as two parts, one deals with dogs and the other deals with cats.
## # A tibble: 14 x 4
## breed sex_upon_outcome freq pecentage
## <chr> <chr> <int> <dbl>
## 1 Labrador Retriever Mix Intact Male 409 0.00924
## 2 Australian Cattle Dog Mix Spayed Female 413 0.00934
## 3 Chihuahua Shorthair Mix Intact Female 443 0.0100
## 4 Chihuahua Shorthair Mix Intact Male 466 0.0105
## 5 German Shepherd Mix Spayed Female 649 0.0147
## 6 Pit Bull Mix Intact Female 682 0.0154
## 7 German Shepherd Mix Neutered Male 699 0.0158
## 8 Pit Bull Mix Intact Male 804 0.0182
## 9 Chihuahua Shorthair Mix Spayed Female 1499 0.0339
## 10 Labrador Retriever Mix Spayed Female 1583 0.0358
## 11 Chihuahua Shorthair Mix Neutered Male 1789 0.0404
## 12 Labrador Retriever Mix Neutered Male 1814 0.0410
## 13 Pit Bull Mix Spayed Female 1920 0.0434
## 14 Pit Bull Mix Neutered Male 2140 0.0484
After running the R script, we find that the Pit Bull Mix is of most often spayed or neutered: 4.82% is spayed female, and 5.32% is neutered.
## # A tibble: 11 x 2
## breed freq
## <chr> <int>
## 1 Rat Terrier Mix 412
## 2 Catahoula Mix 423
## 3 Miniature Poodle Mix 564
## 4 Border Collie Mix 574
## 5 Boxer Mix 617
## 6 Dachshund Mix 698
## 7 Australian Cattle Dog Mix 938
## 8 German Shepherd Mix 1724
## 9 Chihuahua Shorthair Mix 4223
## 10 Labrador Retriever Mix 4224
## 11 Pit Bull Mix 5562
From the analysis above, we find that Pit Bull Mix, Chihuahua Shorthair Mix and Labrador Retriever Mix are among the most breeds in the Austin Animal center.
For cats,
## # A tibble: 179 x 3
## breed sex_upon_outcome freq
## <chr> <chr> <int>
## 1 Domestic Medium Hair Mix Neutered Male 657
## 2 Domestic Shorthair Mix Intact Female 3381
## 3 Domestic Shorthair Mix Neutered Male 6109
## 4 Domestic Shorthair Mix Unknown 2070
## 5 <NA> <NA> NA
## 6 <NA> <NA> NA
## 7 <NA> <NA> NA
## 8 <NA> <NA> NA
## 9 <NA> <NA> NA
## 10 <NA> <NA> NA
## # ... with 169 more rows
Again, we find that the cat Domestic Shorthair Mix are among the most neutered or spayed.
## breed freq
## 3 Domestic Medium Hair 119
## 8 Snowshoe Mix 122
## 1 American Shorthair Mix 209
## 5 Domestic Shorthair 348
## 7 Siamese Mix 888
## 2 Domestic Longhair Mix 1101
## 4 Domestic Medium Hair Mix 2100
## 6 Domestic Shorthair Mix 20996
From the analysis above, we know that Domestic Shorthair Mix, Domestic Medium Hair Mix and American Shorthair Mix are most adopted in the center.
We find that Pit Bull Mix and Domestic shorthair Mix are adopted by the Austin Animal Center mostly. And they are spayed or neutered more than other breeds. Dogs and cats are our friends, they are cute. Everyone that has pets with dogs/cats to treat their pets wisely.
animals_train = read_csv("~/Downloads/train.csv", suppressMessages(TRUE))
animals_test = read_csv("~/Downloads/test.csv", suppressMessages(TRUE))
# Drop 'ID' column (first column)
animals_train = animals_train[,-1]
animals_test = animals_test[,-1]
# merge train and test data together
animals <- bind_rows(animals_train, animals_test)
# This converts the AgeuponOutcome into number of days
animals$TimeValue <- sapply(animals$AgeuponOutcome,
function(x) strsplit(x, split = ' ')[[1]][1])
animals$UnitofTime <- sapply(animals$AgeuponOutcome,
function(x) strsplit(x, split = ' ')[[1]][2])
animals$UnitofTime <- gsub('s', '', animals$UnitofTime)
animals$TimeValue <- as.numeric(animals$TimeValue)
animals$UnitofTime <- as.factor(animals$UnitofTime)
multiplier <- ifelse(animals$UnitofTime == 'day', 1,
ifelse(animals$UnitofTime == 'week', 7,
ifelse(animals$UnitofTime == 'month', 30,
ifelse(animals$UnitofTime == 'year', 365, NA))))
animals$AgeinDays <- animals$TimeValue * multiplier
# drop now unnessesary columns
animals <- subset(animals, select=-c(TimeValue, UnitofTime, AgeuponOutcome))
# fill in some NAs in the data
animals$Name[is.na(animals$Name)] <- "Unknown"
animals$SexuponOutcome[is.na(animals$SexuponOutcome)] <- "Unknown"
animals$OutcomeType[is.na(animals$OutcomeType)] <- "Unknown"
animals$OutcomeSubtype[is.na(animals$OutcomeSubtype)] <- "Unknown"
# create a new value based on whether the name is known
animals$HasName <- ifelse(animals$Name == 'Unknown', 0,
ifelse(animals$Name != 'Unknown', 1, NA))
# lastly convert to factors
animals <- animals %>% mutate(Name = as.factor(Name),
OutcomeType = as.factor(OutcomeType),
OutcomeSubtype = as.factor(OutcomeSubtype),
AnimalType = as.factor(AnimalType),
SexuponOutcome = as.factor(SexuponOutcome),
Breed = as.factor(Breed),
Color = as.factor(Color),
HasName = as.factor(HasName))
animals %>% subset(OutcomeType != "Unknown" & OutcomeType != "Transfer" & AnimalType == "Cat") %>% ggplot(aes(DateTime, fill=OutcomeType)) +
geom_density(alpha=0.25)
animals %>% subset(AnimalType == "Cat") %>% ggplot(aes(DateTime, AgeinDays)) +
geom_point(alpha = 0.01) +
scale_y_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 23 rows containing missing values (geom_point).
I wanted to investigate the odd behavior of cats in the datset, particularly the large increase in cats entering the shelter in the summer compared to the the winter. Especially becuase dogs don’t show these seasonal variations. In the first plot I looked at the distribution of outcomes over time. While return to owner remains relatively constant, we see large spikes in the other three outcomes during the summer months. Especially deaths in the summer of 2015. A quick google search found that Texas was undergoing a large heatwave in July 2015 and I believe many more cats may have dies to heat exposure.
I had two theories as to why cat may turn up more in the shelter in the summer. The first was that owners may be leaving their doors and windows open and cats would escape the house, or was this due to the breading cycle of the cats, which typically give birth in the spring. To do this I plotted the age of cats over time. The horizontal banding is due to the coversion of discrete time increments to linear. The darkest line (and therefore most frequent) is right at the 3 month or 90 day mark. Additionally we see distinct gaps during the winter of young cats, while older cats do not display these same gaps. Based on this I believe the spikes in cats are due to outdoor or feral cats breeding each year. This is a known problem and many states and contries have passed laws requiring neutering of cats to pevent rampant breeding.
To begin tidying the data, I first removed columns I didn’t feel were necessary for our analysis, including variables like the animal ID and the name of the animal. I then separated the sex and status of whether the animal is neutered or not from one column to two. I then made the datetime column more useful by only including the month and the year and separting the two so that they could be examined separately. The code I used to tidy the data can be found below:
shelterdata1<-select(shelter_data, -animal_id, -color, -monthyear, -name, -X13, -X14, -X15, -X16, -X17, -X18)
shelterdata1<- shelterdata1 %>% separate(sex_upon_outcome, into=c("Status","Sex"),sep=" ", fill="left")
shelterdata1<- shelterdata1 %>% separate(datetime, into=c("Year","Month"),sep="-",extra="drop")
shelterdata1<-shelterdata1[-31,]
shelterdata1$Year<-parse_integer(shelterdata1$Year)
shelterdata1$Month<-parse_integer(shelterdata1$Month)
I began by investigating different patterns between years at the animal shelter. First, I analyzed the spread of animals through various years in regards to their spayed and neutered status. The spread seems to have remained constant over the past few years, though it is interesting to note that the number of total animals reported increased significantly between 2013 and 2015. This suggests that the animal shelter has expanded and has made changes that allow them to take in more animals
Next, I examined patterns in the various outcomes between each type of animal for each year in the dataset. The first plot shows the outcome type per year, and the second shows the outcome subtype. While many of the outcomes seem to be relatively constant throughout the years, some things that stand out are that there is a massive spike in rto-adoptions for dogs in 2017. The animal might want to examine what changes occurred in 2017 that caused this and continue it in the following years to initiate more rto-adoptions. A similar spike occurs in cats for the same year, though not as large.
Finally, I examined patterns between months for different types of outcomes in different animals. The most notable trend amongst all this is that there seems to be a pattern between birds, cats, and dogs of larger amount of disposal in the beginning of the year.Overall, the transfer program and adoption rates seem consistent on a monthly basis, so the shelter seems to be doing a good job with that.
To analyze our overall question of which factors lead to animals being more or less likely to be adopted, I decided to analyze how age affects adoption rates. If we can identify patterns about what aged animals are more and less likely to be adopted, then we can make broader conclusions about the factors that influence both adoption rates, and the likely outcome of an animal coming into the shelter.
First, I seperated the age_upon_outcome column to be able to separate among the age levesl of weeks, months, and years. There seems to be an interesting trend among the adoption levels of animal species among the different age categories. For example, among all animal types none were adopted when they were less than 1 month old. Among birds, there were roughly equal amounts of adoptions among months old and years old birds. Among Other types there was a slightly larger amount of animals being adopted at months old and not years old. There was an increase in adoption of older Dogs compared with younger dogs. While there seemed to be a decrease in adoption of cats as they were older.
This implies that among animals entering shelters, they would be more likely to be adopted younger if they are cats, while they are more likely to be adopted older if they are dogs. We can also analyze to see the general trends among species in outcomes.
From the plot above we can see that among birds, there is no trend in outcome. Thus for birds we cannot make accurate predictions on their eventual outcomes once they have entered a shelter. Among Cats, their likely outcomes upon entering the shelter are being adopted followed by transfering to a new shelter. Dogs are most likely to be adopted, followed by being returned to their owners, and then being transfered. Livestock only seem to be transfered or adopted, but never euthanized. Other types of animals are most likely to be euthanized. This illustrates that species is a big factor in the eventual outcome of the animal entering into the shelter.
Sarah: In order to tidy the dataset, I primarily used select separate, and the parse function. I used these to select important columns, break up columns into multiple variables, and turn them into a type that would be useful to use. I plotted many different variables in the dataset over different years and months to look for patterns amongst them so the shelter might be able to reflect on what they did in those years or months that would have caused these patterns, particularly among the outcomes of animals.
David: In this report, I firstly count the number of the different animals with the different outcomes using group_by and count in tidyverse. Then we select variables and we use and transform the age variables to category variables including day,days,month,months,week,weeks,year,years to represent the different age. Finally we create plots of the variables to answer the question.
Derek: I tidyied the dataset by droping uneccessary columns, appropriately converting columns to factors, I fixed NA values by filling in an “Unknown” value. Lastly I created a HasName variable and converted the age of the animal into days. This let me compare ages as a linear values instead of factors. I then looks at whether the absence of a known name affected animal outcome, and then I looked at some of the pecularities of cats during the summer.
Abby: I tidied some of the data Sarah had previously tidied in order to do my analysis. I primarly did this through separate in order to get an age group column to use in my comparisons. I then created a plot to show the adoption counts for each animal type among the different age groups. This allowed me to see how age affected the adoption rates of different species. Finally, I created a plot showing the counts of different outcomes for each animal species. This allowed me to draw more conclusions about the likely outcome of an animal when they first enter the shelter.
Will: For this report, I used the as.tibble,count and order to figure out the breed of cat and dog. In order to calculate how many dogs and cats entering into the shelter in different species. Finally I created the frequency of the different breed of gods that entering into the animal shelter.
From the graph above we can see that, on average, rainfall has decrease for the variables rainfall, charcoal, and BSi. However, for the variables BSi.mar, n, and TEX86 there seems to have been neither an increase nor a decrease in the rainfall. The variable for average rainfall seems to fluctuate, but has generally been decreasing, as the years increase.
An overall question we would like to investigate in next weeks lab is: what trends are there in animals leaving and entering a humane society and what factors lead to more animals getting adopted?
For our domain expert, we are going to be doing this research for the Austin Animal Shelter in Texas. By analyzing their data, we can help them to determine what factors lead to higher rates of adoption of their animals. We can also see which factors may be applicable to other animal shelters across the country. By discovering the trends present in this data set they will be able to change policy and practices to hopefully increase adoption rates and possibly decrease instances of euthanasia.
The data we will use to answer our questions is a dataset from Kaggle that gives information about thousands of animals admitted to the Austin Animal Center Shelter. The data set describes each animal with details including its age, gender, and condition, and outcome of the animal. This dataset can be found at the following URL: https://www.kaggle.com/aaronschlegel/austin-animal-center-shelter-outcomes-and
shelterdata3<-select(shelter_data33, -animal_id, -color, -monthyear, -name, -X13, -X14, -X15, -X16, -X17, -X18)
shelterdata3<- shelterdata3 %>% separate(sex_upon_outcome, into=c("Status","Sex"),sep=" ", fill="left")
shelterdata3<- shelterdata3 %>% separate(datetime, into=c("Year","Month"),sep="-",extra="drop")
## Warning: Too few values at 1 locations: 31
shelterdata3<-shelterdata3[-31,]
shelterdata3$Year<-parse_integer(shelterdata3$Year)
shelterdata3$Month<-parse_integer(shelterdata3$Month)
shelterdata4<-as.tibble(select(shelterdata3, age_upon_outcome,animal_type,outcome_type,Sex))
testing6<-as.tibble(filter(shelterdata4,age_upon_outcome!="NULL",outcome_type!="Died",outcome_type!="Missing",outcome_type!="Rto-Adopt",outcome_type!="NA"))
## Warning: package 'bindrcpp' was built under R version 3.3.3
testing7<-testing6%>%
separate(age_upon_outcome,c("age_num","age_unit")," ")
testing8<-as.tibble(filter(testing7,age_unit!="day",age_unit!="days",age_unit!="month",age_unit!="week",age_unit!="year",age_unit!="weeks"))
testing9<-as.tibble(filter(testing8,outcome_type=="Adoption"))
ggplot(data = testing9,mapping=aes(x=animal_type),na.rm=TRUE)+geom_bar()+facet_wrap(~age_unit)
As we can see from the plot above, one of the major factors influencing adoption rates is age, dependent on species. For cats, there are more adoptions when they are younger, while there are conversely more adoptions for older dogs. This could be due to the fact that older dogs are more likely to be perceived as mild, and could have been trained by previous owners to be well-behaved. Since cats tend to wander outside and mate, there could be a correlation between young cats and adoption simply because there may be more young cats born on the street that become strays. Either way, we see a clear trend to adopt more younger cats and older dogs.
In this plot, we found that cats go missing from there home significantly more often in the summer than any other time of year, while the rate of dogs coming to the shelter are relatively constant throughout the year. This can probably be attributed to the fact that people may have windows and doors open more often in their homes during the summer than any other time of year, making it easier for cats to escape and explore beyond their boundaries. This means that every summer, there is generally a larger influx of cats admitted to the animal shelter. The shelter could use this information to make sure they are equipped to handle more cats in the summer than the rest of the year; on the other hand, they can expect to have a relatively consistent number of dogs in the shelter.
Do different types of animals have different outcomes of entering the shelter, and how do different kinds of animals have different abilities to adapt? Since we can investigate this problem from age and sex and different outcomes and give consumers information for what they want to breed.
Does classifcation of cats help them get better service? For example, we could classify different kinds of cat by ages. The cats with younger age could adopt much more quickly than adult cats, and younger cats could have more convenience service than others. For example like therapy
library(tidyverse)
mydata<-read.csv('aac_shelter_outcomes.csv')
mydata<-mydata[,1:12]
mydata %>%group_by(animal_type,outcome_type)%>%count()
## # A tibble: 37 x 3
## # Groups: animal_type, outcome_type [37]
## animal_type outcome_type n
## <fct> <fct> <int>
## 1 Bird Adoption 111
## 2 Bird Died 4
## 3 Bird Disposal 22
## 4 Bird Euthanasia 82
## 5 Bird Missing 1
## 6 Bird Relocate 7
## 7 Bird Return to Owner 9
## 8 Bird Transfer 76
## 9 Cat "" 2
## 10 Cat Adoption 11621
## # ... with 27 more rows
library(ggplot2)
thedata<-subset(mydata,select =c(age_upon_outcome,animal_type,outcome_type))
thedata<-subset(thedata,thedata$outcome_type!="")
thedata$age_upon_outcome<-as.character(thedata$age_upon_outcome)
thedata$age_type<-(strsplit(thedata$age_upon_outcome," "))
thedata$age_type<-as.character(lapply(thedata$age_type,function(x){unlist(x)[2]}))
cat_data<-subset(thedata,thedata$animal_type=="Cat")
cat_data<-na.omit(cat_data)
ggplot(data=thedata,aes(x=animal_type,fill=factor(outcome_type)))+geom_bar(position="fill")+labs(title = "outcomes of entering the shelter for different types of animals", x = "types of animals", y = "percentage")
ggplot(data=cat_data,aes(x=age_type,fill=factor(outcome_type)))+geom_bar(position="fill")+labs(title = "outcomes of entering the shelter for different age of cats", x = "types of age", y = "percentage")
For the first question, we can see that the cat and dog have bigger percentage to be adopt, and the cat has bigger percentage to be adopt in the months of age, also we could find that cat has the smallest percentage to be adopt in the days of the month. Therefore, we need to care more about younger cats.
My primary concern is how breed of dog/cats will affect the sex_upon_outcome of animals that enter into the Austin Animal center. We are curious about whether the sex_upon_outcome has a significant different when breed varies.
Then, we will explore which breeds of animals are more likely to enter into the Austin Animal center.
We divide our result as two parts, one deals with dogs and the other deals with cats.
## # A tibble: 14 x 4
## breed sex_upon_outcome freq pecentage
## <chr> <chr> <int> <dbl>
## 1 Labrador Retriever Mix Intact Male 409 0.00924
## 2 Australian Cattle Dog Mix Spayed Female 413 0.00934
## 3 Chihuahua Shorthair Mix Intact Female 443 0.0100
## 4 Chihuahua Shorthair Mix Intact Male 466 0.0105
## 5 German Shepherd Mix Spayed Female 649 0.0147
## 6 Pit Bull Mix Intact Female 682 0.0154
## 7 German Shepherd Mix Neutered Male 699 0.0158
## 8 Pit Bull Mix Intact Male 804 0.0182
## 9 Chihuahua Shorthair Mix Spayed Female 1499 0.0339
## 10 Labrador Retriever Mix Spayed Female 1583 0.0358
## 11 Chihuahua Shorthair Mix Neutered Male 1789 0.0404
## 12 Labrador Retriever Mix Neutered Male 1814 0.0410
## 13 Pit Bull Mix Spayed Female 1920 0.0434
## 14 Pit Bull Mix Neutered Male 2140 0.0484
After running the R script, we find that the Pit Bull Mix is of most often spayed or neutered: 4.82% is spayed female, and 5.32% is neutered.
## # A tibble: 11 x 2
## breed freq
## <chr> <int>
## 1 Rat Terrier Mix 412
## 2 Catahoula Mix 423
## 3 Miniature Poodle Mix 564
## 4 Border Collie Mix 574
## 5 Boxer Mix 617
## 6 Dachshund Mix 698
## 7 Australian Cattle Dog Mix 938
## 8 German Shepherd Mix 1724
## 9 Chihuahua Shorthair Mix 4223
## 10 Labrador Retriever Mix 4224
## 11 Pit Bull Mix 5562
From the analysis above, we find that Pit Bull Mix, Chihuahua Shorthair Mix and Labrador Retriever Mix are among the most breeds in the Austin Animal center.
For cats,
## # A tibble: 179 x 3
## breed sex_upon_outcome freq
## <chr> <chr> <int>
## 1 Domestic Medium Hair Mix Neutered Male 657
## 2 Domestic Shorthair Mix Intact Female 3381
## 3 Domestic Shorthair Mix Neutered Male 6109
## 4 Domestic Shorthair Mix Unknown 2070
## 5 <NA> <NA> NA
## 6 <NA> <NA> NA
## 7 <NA> <NA> NA
## 8 <NA> <NA> NA
## 9 <NA> <NA> NA
## 10 <NA> <NA> NA
## # ... with 169 more rows
Again, we find that the cat Domestic Shorthair Mix are among the most neutered or spayed.
## breed freq
## 3 Domestic Medium Hair 119
## 8 Snowshoe Mix 122
## 1 American Shorthair Mix 209
## 5 Domestic Shorthair 348
## 7 Siamese Mix 888
## 2 Domestic Longhair Mix 1101
## 4 Domestic Medium Hair Mix 2100
## 6 Domestic Shorthair Mix 20996
From the analysis above, we know that Domestic Shorthair Mix, Domestic Medium Hair Mix and American Shorthair Mix are most adopted in the center.
We find that Pit Bull Mix and Domestic shorthair Mix are adopted by the Austin Animal Center mostly. And they are spayed or neutered more than other breeds. Dogs and cats are our friends, they are cute. Everyone that has pets with dogs/cats to treat their pets wisely.
To begin tidying the data, I first removed columns I didn’t feel were necessary for our analysis, including variables like the animal ID and the name of the animal. I then separated the sex and status of whether the animal is neutered or not from one column to two. I then made the datetime column more useful by only including the month and the year and separting the two so that they could be examined separately. The code I used to tidy the data can be found below:
shelterdata1<-select(shelter_data, -animal_id, -color, -monthyear, -name, -X13, -X14, -X15, -X16, -X17, -X18)
shelterdata1<- shelterdata1 %>% separate(sex_upon_outcome, into=c("Status","Sex"),sep=" ", fill="left")
shelterdata1<- shelterdata1 %>% separate(datetime, into=c("Year","Month"),sep="-",extra="drop")
shelterdata1<-shelterdata1[-31,]
shelterdata1$Year<-parse_integer(shelterdata1$Year)
shelterdata1$Month<-parse_integer(shelterdata1$Month)
I began by investigating different patterns between years at the animal shelter. First, I analyzed the spread of animals through various years in regards to their spayed and neutered status. The spread seems to have remained constant over the past few years, though it is interesting to note that the number of total animals reported increased significantly between 2013 and 2015. This suggests that the animal shelter has expanded and has made changes that allow them to take in more animals
Next, I examined patterns in the various outcomes between each type of animal for each year in the dataset. The first plot shows the outcome type per year, and the second shows the outcome subtype. While many of the outcomes seem to be relatively constant throughout the years, some things that stand out are that there is a massive spike in rto-adoptions for dogs in 2017. The animal might want to examine what changes occurred in 2017 that caused this and continue it in the following years to initiate more rto-adoptions. A similar spike occurs in cats for the same year, though not as large.
Finally, I examined patterns between months for different types of outcomes in different animals. The most notable trend amongst all this is that there seems to be a pattern between birds, cats, and dogs of larger amount of disposal in the beginning of the year.Overall, the transfer program and adoption rates seem consistent on a monthly basis, so the shelter seems to be doing a good job with that.
To analyze our overall question of which factors lead to animals being more or less likely to be adopted, I decided to analyze how age affects adoption rates. If we can identify patterns about what aged animals are more and less likely to be adopted, then we can make broader conclusions about the factors that influence both adoption rates, and the likely outcome of an animal coming into the shelter.
First, I seperated the age_upon_outcome column to be able to separate among the age levesl of weeks, months, and years. There seems to be an interesting trend among the adoption levels of animal species among the different age categories. For example, among all animal types none were adopted when they were less than 1 month old. Among birds, there were roughly equal amounts of adoptions among months old and years old birds. Among Other types there was a slightly larger amount of animals being adopted at months old and not years old. There was an increase in adoption of older Dogs compared with younger dogs. While there seemed to be a decrease in adoption of cats as they were older.
This implies that among animals entering shelters, they would be more likely to be adopted younger if they are cats, while they are more likely to be adopted older if they are dogs. We can also analyze to see the general trends among species in outcomes.
From the plot above we can see that among birds, there is no trend in outcome. Thus for birds we cannot make accurate predictions on their eventual outcomes once they have entered a shelter. Among Cats, their likely outcomes upon entering the shelter are being adopted followed by transfering to a new shelter. Dogs are most likely to be adopted, followed by being returned to their owners, and then being transfered. Livestock only seem to be transfered or adopted, but never euthanized. Other types of animals are most likely to be euthanized. This illustrates that species is a big factor in the eventual outcome of the animal entering into the shelter.
Sarah: In order to tidy the dataset, I primarily used select separate, and the parse function. I used these to select important columns, break up columns into multiple variables, and turn them into a type that would be useful to use. I plotted many different variables in the dataset over different years and months to look for patterns amongst them so the shelter might be able to reflect on what they did in those years or months that would have caused these patterns, particularly among the outcomes of animals.
David: In this report, I firstly count the number of the different animals with the different outcomes using group_by and count in tidyverse. Then we select variables and we use and transform the age variables to category variables including day,days,month,months,week,weeks,year,years to represent the different age. Finally we create plots of the variables to answer the question.
Abby: I tidied some of the data Sarah had previously tidied in order to do my analysis. I primarly did this through separate in order to get an age group column to use in my comparisons. I then created a plot to show the adoption counts for each animal type among the different age groups. This allowed me to see how age affected the adoption rates of different species. Finally, I created a plot showing the counts of different outcomes for each animal species. This allowed me to draw more conclusions about the likely outcome of an animal when they first enter the shelter.
From the graph above we can see that, on average, rainfall has decrease for the variables rainfall, charcoal, and BSi. However, for the variables BSi.mar, n, and TEX86 there seems to have been neither an increase nor a decrease in the rainfall. The variable for average rainfall seems to fluctuate, but has generally been decreasing, as the years increase.